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Abstract: A discretization scheme for variable coefficient elHptic PDEs in the 
plane is presented. The scheme is based on high-order Gaussian quadratures 
and is designed for problems with smooth solutions, such as scattering problems 
involving soft scatterers. The resulting system of linear equations is very well 
suited to efficient direct solvers such as nested dissection and the more recently 
proposed accelerated nested dissection schemes with 0{N) complexity. 

1. Introduction 

1.1. Background. This note describes some tentative ideas for how to discretize and solve a class 
of variable-coefficient elliptic PDEs with smooth solutions. The ultimate goal is to device efficient 
methods for (soft) scattering problems in the plane, modeled by the Helmholtz' equation 

^2 

-A(/-(x) - ^--5-0(2;) = 0, 

where c(x) is a smooth function that is constant outside some domain il, and the "loading" of the 
system is an incoming wave that satisfies the Helmholtz equation for the constant value of c outside 

n. 

The method described is high-order accurate and and leads to a linear system of algebraic equations 
that is very well-suited to "nested-dissection" type direct (as opposed to iterative) solvers, hi a simple 
implementation, the resulting solver has 0{N^'^) complexity, where N is the total number of degrees 
of freedom in the discretization. We believe that the cost of the nested dissection step can be further 
reduced to 0{N) by exploiting techniques similar to those of [HO [3]. 

hi this initial work, we make several simplifying assumptions in order to investigate the basic 
viability of the method. The most significant simplification is that instead of studying the Helmholtz 
equation (which has oscillatory solutions), we study the modified Helmholtz equation (which has 
non-oscillatory solutions the decay exponentially fast). The remaining simplifications are, we believe, 
mostly cosmetic. 

1.2. Problem statement. We consider the equation 

^ ( -V{a{x)V(t>{x)) + b{x)(t>{x) = 0, xen, 

\ 4>n{x) = v{x), X ev, 

where ^2 is a box in the plane with boundary T = dil, and where (pn is the normal derivative of (j). 
We assume that the functions a and b are C°°, that 6 > 0, and that for every x G Q, a{x) is positive 
definite. Under these assumptions, the solution (p and its gradient Vc/) will be C°° in the interior of 
the domain and can to very high accuracy be specified via tabulation at Gaussian quadrature nodes. 
Near the boundary T, the question of smoothness in general gets complicated, but in this preliminary 
report we sidestep this issue by assuming that the given boundary data v is such that the solution 
(p is C°° on the closed domain (Ultimately, the technique will be applied to scattering problems 
and the function v will be the restriction to T of the "incoming wave.") 

1.3. Outline of the discretization scheme. We propose to tessellate the computational domain 
into a large number of small squares and then use the fluxes across the boundaries of each square 
as the unknown variables in the model. The flux is represented as a function along the edge; since 
this function is smooth, it can very accurately be represented by simply tabulating it at Gaussian 
nodes along the edge. We observe that if the fiuxes through all four edges of a box are known, then 



2 



the values of the potential (j) on all of the edges can be constructed via the so called "Neumann-to- 
Dirichlet" (N2D) operator for the box. These operators can cheaply be constructed for all the boxes 
via a local computation. Once the N2D operators for all boxes are known, we construct for each 
edge an equilibrium equation by combining the N2D operators of the two boxes that share the edge. 
By combining the equilibrium equations for all interior edges, we obtain a global equation for all the 
interior boundary fluxes. Once this global equation has been solved, the potential on any box can 
easily be reconstructed by solving a local Neumann problem on the box (since the boundary fluxes 
are now all known). 

1.4. Outline of the linear solver. The discretization described in Section 11.31 results in a large 
sparse linear system with a coefficient matrix A. If there are A'^edge edges in the model, and we 
place A'gauss interpolation nodes at each edge, then A is a block matrix consisting of iVodgc x -^cdgo 
blocks, each of size iVgauss x -^gauss- By ordering the edges in a nested dissection fashion, fill-in can 
be limited in the factorization of A, resulting in an 0{N^^^^ total cost for the initial solve. (Once 
one factorization has been executed, subsequent solves require O(iVedge) operations.) 

The dominant cost in the factorization described above is the inversion or factorization of dense 

1 /2 1 /2 

matrices of size roughly -A'^dge ^ "^cdgc' These matrices have internal structure (they are so call Hier- 
archically Semi- Separable (HSS) matrices) which can be exploited to further reduce the complexity 

to O(iVcdge). 



2. Discretization 

We tessellate Q. into an array of small boxes, and let {r'-^^jiG/icavcs denote the collection of edges of 
these boxes, see Figured) We include both interior and exterior edges. For an edge i, we define it^*-* 
as the restriction of to P^*-*: 

n(*)(2;) = (/)(x), forxGF^*). 
Further, we define i;^*^ as the restriction of the normal derivative across F*^*): 

[92'/'] (3;) for X G F*-*^ when F*-*^ is horizontal, 
[di(j)\{x) for X G F*-*^ when F^*-* is vertical, 
where we used the short-hand di = d/dxi. 



v^-^{x) 



Observation: All the functions u^*^ and u*-*^ are smooth. They can to very high accuracy be specified 
by tabulating them at Gaussian points on the boundary and then interpolate between these points. 



On each line F'^*^ we place A'g 

J^iVgauss > 



• gauss Gaussian nodes. These points are collected in vectors 7^*-* G 
= Then we form vectors u^*^!)^*) G M^g^"=s by collocating the boundary functions u^*) and 
at the Gaussian nodes: 



u 



(0 



u 



;«(7« 



3. The equilibrium equations 

3.1. Definition of the Neumann-to-Dirichlet operator. Let il^^^ be a subdomain of with 
edges F(^i), F^), F^), F^**), as shown in Figure [21 We define the boundary potentials and boundary 
fluxes for Jl^"^) via 



u 



ir) 



^(i4) 



and 



yiu) 



Then there exists a unique operator T^'^^ such that with n^"^-* and w^'^-' derived from any solution (j) to 
(jl.ip . we have 

(3.1) u^^^ = T^^^ v^^\ 

This claim follows from the fact that a Neumann boundary value problem on il^"^) has a unique 
solution. The operator T*^"^) is mathematically an integral operator called the Neumann-to-Dirichlet 
operator. 



Observation: The N2D operator is in general a complicated object. It has a singular kernel even 
for domains with smooth boundaries. When the domain boundary has corners, further complications 
arise. However, in our case, all such subtleties can be ignored since the boundary potentials of interest 
are all restrictions of functions that globally solve (jl.ip . and are in consequence smooth. 



The discrete analog of the equation (13. ip is 

(3.2) -uW 

For our purposes, it is sufficient for the matrix T*-"^-* to correctly construct it^^^ for any permissible 
vectors v^'^\ (By permissible, we mean that they are the restriction of a function in the solution set 
under consideration.) Written out in components, T^"^) is a 4 x 4 block matrix that satisfies 



(3.3) 



3.2. Construction of the N2D operator for a small box. Let Q^'^^ be a small box with edges 
p{«i)^ p{*2)^ p{«3)^ r^*''^ as shown in Figure[2l and consider the task of constructing a matrix T^"^) such 
that (13. 2p holds for all permissible potentials. We generate (by brute force) a collection of solutions 
that locally span the solution space to the desired precision. For each cpj, we construct the 
corresponding vectors of boundary values 
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- j(r,ll) 


















j(r,22) 


j{r,23) 


T(^'24) 










T(^.31) 




j{r,33) 






t;{''3) 








jirA2) 


T(^.43) 


j(r,44) 







0,(7(^2)) 

0,(7^)) 



and 



and then we construct via a least squares procedure a matrix T^"^) such that the equation 
(3.4) [ui U2 ■■■ UN,, ] = T(^) [Vi V2 



holds to within the specified tolerance e. 

The sample functions (pj are constructed by solving a set of local problems on a patch ^ that 
covers the domain as shown in Figure [5j The local problems read 



(3.5) 



-V{a{x)V(j)j{x)) + b{x) (pj{x) = 0, 

9„^(x) = Vj{x), 



X G^, 

X e d^, 



where d and b are functions chosen so that: 

(1) For X e 0^"^^ we have d{x) = a{x) and b{x 

(2) The equation (j3.5p is easy to solve. 



b{x). 



The collection of boundary data {vj}j^ 



'^^'^ is chosen so that the solution space is sufficiently "rich." 
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3.3. Assembling a global equilibrium equation. In this section, we will formulate a linear 
equation that relates the following variables: 

Given data: {v^^^ : i is an edge that is exterior to Vi}, 

Sought data: {■u'-*^ : i is an edge that is interior to 0}. 

Let A^edge denote the number of interior edges. Then the coefficient matrix of the linear system 
will consist of A'cdgc x -^cdge blocks, each of size A'gauss x A^gauss- Each block row in the system will 
have at most 7 non-zero blocks. To form this matrix, let i denote an interior edge. Suppose for 
a moment that i is a vertical edge. Let ri and T2 denote the two boxes that share the edge i, let 
{mi, m2, m3, 7714} denote the edges of ri, and let {ni, n2, ^3, 774} denote the edges of r2, see Figure 
[31 The N2D operator for ti provides an equation for the boundary fluxes of the left box: 

(3.6) = t(^1'2^) + T(^1'22) -y("^2) _^ j(ri,23) _^ j{ri,24) ^(7714)^ 

Analogously, the N2D operator for T2 provides the equation 

(3.7) u^ni) ^ j(r2,41) ^(m) _^ j(t2,42) ^(na) _^ j(t2,43) ^(ng) _^ j(r2,44) ^{ni) ^ 

Observing that m2 = 7^2 = i, we see that u^"^^^ = u^^^\ and consequently (j3.6p and (j3.7p can be 
combined to form the equation 

(3.8) t(^1'21) + t(^1'22) + t(^1'23) t;^'"^') + T^^^'^"^) i)^™^) 

= j(t2,41) ^(ni) _^ j(r2,42) ^(n2) _^ j{r2,43) ^(ns) _^ j(t2,44) ^(i). 



The collection of all equations of the form (|3.8p for interior vertical edges, along with the analogous 
set of equations for all interior horizontal edges forms the global equilibrium equation. 

4. Efficient direct solvers 

This section describes a direct solver for the global equilibrium equation constructed in Section [3l 
The idea is to partition the box J7 into a quad-tree of boxes, and then to construct the N2D operator 
T^"^^ for each box r in the tree. The first step is to loop over all leaf nodes of the tree and construct 
the N2D operator via the procedure described in Section 13. 2[ Then we execute an upwards sweep 
through the tree, where we construct the N2D operator for a box by merging the operators of its four 
children. For simplicity (and also computational efficiency) we execute each "merge-four" operation 
as a set of three "merge-two" operations. 

Let N denote the size of the coefficient matrix. Then Section 14.21 describes a procedure with 
0{N^'^) complexity, and Section 14.31 sketches out how the procedure can be accelerated to 0{N) 
complexity. Before describing the fast solvers, we describe a hierarchical decomposition of the domain 
in Section WA\ 

4.1. A quad-tree on the domain. A standard quad-tree is formed on the computational domain 
VL as follows: Let = be the root of the tree, as shown in Figure Ul^a). Then split fi^-*^^ into four 
disjoint boxes 

as shown in Figure IHb). Continue by splitting each of the four boxes into four smaller equisized 
boxes: 

n(5) = o(6)uo(^)uo(8)uo(9), 

0(6) = 17(10) u (7(11) U 0(12) u 0(13), 

o(^) = o(i'i)u 0(15) u 0(16) uo(i^), 

0(8) = 0(18) u 0(19) UO(20)u 0(21), 
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as shown in Figure |3)[c) . The process continues until each box is smah enough that the N2D operator 
for each leaf can easily be constructed via the procedure described in Section 13.21 The levels of the 
tree are ordered so that £ = is the coarsest level (consisting only of the root), i = 1 is the level 
with four boxes, etc. We let L denote the total number of levels in the tree. 

4.2. Simple construction of the N2D operator for a parent. Suppose that o" is a box with 
children z^i and as shown in Figure El and that we know the matrices T^^^^^ and T^'^^) associated 
with the children. We seek to construct the matrix T^""). The equilibrium equations for the two 
children read 

(4.1) = '^^"^'"^^ v^"''^ i = 1, 2, 3, 4, 

(4.2) = X^^^i T(^3,ij) ^K)^ i = l,2, 3, 4. 

Observing that li^"^^) — ^in„) combine (|4.ip for i = 2 with (j4.2p for i = 4 to obtain the joint 
equation 

(4.3) T^''!'^^) + T^''!'^^) l)^"'^) _^ j(i^l,23) ^(mg) _^ j{i^l,24) ^(1714) 
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Eliminating fj-gm the system via a Schur complement yields the operator T^'^) (upon suitable 

reblocking) . 

At this point, we have described a "merge-two" operation. A "merge-four" operation can of course 
be obtained by simply combining three merge-two operations. To be precise, suppose that r is a 
node with the four children ui, 1^2, 1^3, ^A- We introduce the two "intermediate" boxes cJi and ct2 as 
shown in the following figure: 





Z^4 







Letting the procedure described earlier in the section be denoted by "merge_two_horizontal" and 
defining an analogous function "merge_two_vertical" we then find that 

T('^i) = iiierge_two_horizontal(T(''i\ T^^^)), 
T('^2) = inerge_two_horizontal(T(''2)^ T^''^)), 

= iiierge_two_vertical(T('^i), T^''^)), 

4.3. Fast construction of the N2D operator for a parent. The merge operation described in 
Section 14.21 has asymptotic cost 0(iV^'^), where N is the total number of points on the edges of 
the leaves. To simplify slightly, the reason is that forming the "merge" operation requires matrix 
inversion and matrix-matrix-multiplications for of dense matrices whose size eventually grow to 
0{^fN) X 0{^/N). However, these matrices all have internal structure. To be precise, in ()3.3p . 
the off-diagonal blocks have low rank, and the diagonal blocks are all Hierarchically Semi-Separable 
(HSS) matrices. This means that accelerated matrix algebra can be used. For a matrix of size 
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N' X N', inversion can in fact be executed in 0{N') operations (provided that the "HSS-rank" is a 
fixed low number, which it is in this case). 

The acceleration procedure proposed here is analogous to the one described in [HO [3]. 
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Figure 1. The computational box (gray) is split into 16 small boxes. There are 
a total of 40 edges in the discretization, 24 interior ones (solid lines) and 16 exterior 
ones (dashed lines). One interior edge F^*) is marked with a bold line. (Each edge 
continues all the way to the corner, but has been drawn slightly forshortened for 
clarity.) 
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Figure 2. The box Q(^) is marked in gray. Its edges are F(*i), F^*^), r^^s), r^*^). 
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Figure 3. Construction of the equilibrium equation for the edge F^*^ in Fig- 
ure [1] It is the common edge of the boxes i}^'^'^^ and which have edges 
{r(™i), r(™2), F(™3)^ r('"4)}^ and {r("i), F("2), F^"^), F("4)}, respectively. Observe 
that F(*) = F('^2) = F("4) (the bold line). 
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(c) (d) 

Figure 4. Tree structure for a tree with L = 3 levels. There are 10 Gaussian nodes 
on each side of the leaf boxes. The black dots mark the points at which the solution 
(p and its derivative (in the direction normal to the indicated patch boundary) are 
tabulated. 
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Figure 5. Two choices of geometry for the local patch computation. The choice (a) 
is natural since it conforms to the overall geometry. The advantage of choice (b) is 
that the FFT can be used in the angular direction. 




Figure 6. Geometry of the merge operation. 
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1. Introduction 

1.1. Background. This note describes some tentative ideas for how to discretize and solve a class 
of variable-coefficient elliptic PDEs with smooth solutions. The ultimate goal is to device efficient 
methods for (soft) scattering problems in the plane, modeled by the Helmholtz' equation 

^2 

-A(/-(x) - — --5-0(2;) = 0, 

where c(x) is a smooth function that is constant outside some domain f], and the "loading" of the 
system is an incoming wave that satisfies the Helmholtz equation for the constant value of c outside 
^. 

The method described is high-order accurate and and leads to a linear system of algebraic equations 
that is very well-suited to "nested-dissection" type direct (as opposed to iterative) solvers. In a simple 
implementation, the resulting solver has 0{N^'^) complexity, where N is the total number of degrees 
of freedom in the discretization. We believe that the cost of the nested dissection step can be further 
reduced to 0{N) by exploiting techniques similar to those of [ying,grasendyck,xia]. 

In this initial work, we make several simplifying assumptions in order to investigate the basic 
viability of the method. The most significant simplification is that instead of studying the Helmholtz 
equation (which has oscillatory solutions), we study the modified Helmholtz equation (which has 
non-oscillatory solutions the decay exponentially fast). The remaining simplifications are, we believe, 
mostly cosmetic. 

1.2. Problem statement. We consider the equation 

^ ^^ ( -V{a{x)V(l){x)) + b{x)(t>{x) = 0, x e n, 

\ 4>n{x) = v{x), X ev, 

where Q is a box in the plane with boundary T = dil, and where (pn is the normal derivative of (j). 
We assume that the functions a and b are C°°, that 5 > 0, and that for every x £ Q, a{x) is positive 
definite. Under these assumptions, the solution (j) and its gradient Vcj) will be C°° in the interior of 
the domain and can to very high accuracy be specified via tabulation at Gaussian quadrature nodes. 
Near the boundary T, the question of smoothness in general gets complicated, but in this preliminary 
report we sidestep this issue by assuming that the given boundary data v is such that the solution 
(p is C°° on the closed domain $7. (Ultimately, the technique will be applied to scattering problems 
and the function v will be the restriction to F of the "incoming wave.") 

1.3. Outline of the discretization scheme. We propose to tessellate the computational domain 
into a large number of small squares and then use the fluxes across the boundaries of each square 
as the unknown variables in the model. The fiux is represented as a function along the edge; since 
this function is smooth, it can very accurately be represented by simply tabulating it at Gaussian 
nodes along the edge. We observe that if the fluxes through all four edges of a box are known, then 
the values of the potential (p on all of the edges can be constructed via the so called "Neumann-to- 
Dirichlet" (N2D) operator for the box. These operators can cheaply be constructed for all the boxes 
via a local computation. Once the N2D operators for all boxes are known, we construct for each 
edge an equilibrium equation by combining the N2D operators of the two boxes that share the edge. 
By combining the equilibrium equations for all interior edges, we obtain a global equation for all the 
interior boundary fluxes. Once this global equation has been solved, the potential on any box can 
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easily be reconstructed by solving a local Neumann problem on the box (since the boundary fluxes 
are now all known). 

1.4. Outline of the linear solver. The discretization described in Section 11.31 results in a large 
sparse linear system with a coefficient matrix A. If there are N^dge edges in the model, and we 
place A'gauss interpolation nodes at each edge, then A is a block matrix consisting of N^dgc x -^edge 
blocks, each of size A'^g 

auss X -^gauss- By Ordering the edges in a nested dissection fashion, fill-in can 
be limited in the factorization of A, resulting in an O(-^edgc) total cost for the initial solve. (Once 
one factorization has been executed, subsequent solves require 0{Ncdgc) operations.) 

The dominant cost in the factorization described above is the inversion or factorization of dense 

1/2 1/2 

matrices of size roughly N^^^^ x N^^^^ . These matrices have internal structure (they are so call Hier- 
archically Semi- Separable (HSS) matrices) which can be exploited to further reduce the complexity 

to O(iVedge). 

2. Discretization 

We tessellate Q into an array of small boxes, and let {r(*)}jg/j^_^^^^ denote the collection of edges of 
these boxes, see Figure [H We include both interior and exterior edges. For an edge i, we define u^^^ 
as the restriction of (j) to P'^*^: 

nW(x)=0(x), forxGF^^). 
Further, we define t)^*^ as the restriction of the normal derivative across F*^*): 

C [d24']{x) for X G F^*^ when F*-*^ is horizontal, 
v^''{x) = l 

[ [di(/)\{x) for X G F^*' when F*-*-^ is vertical, 
where we used the short-hand di = d/dxi. 

Observation: All the functions u^*-* and u*-*^ are smooth. They can to very high accuracy be specified 
by tabulating them at Gaussian points on the boundary and then interpolate between these points. 

On each line F^*^ we place A'gauss Gaussian nodes. These points are collected in vectors 7^*^ G 
]^iVgaussx2^ Then we form vectors w^*^?;'^*) G j^^gauss |-,y collocating the boundary functions tt*^*) and 
at the Gaussian nodes: 



3. The equilibrium equations 

3.1. Definition of the Neumann-to-Dirichlet operator. Let O^^) be a subdomain of f2 with 
edges F(^i), F^), F^), F^**), as shown in Figure El We define the boundary potentials and boundary 
fluxes for Q^'^^ via 
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(r) 



U 



in) 



and 



V 



in) 

yM 



Then there exists a unique operator T^'^^ such that with u^'^^ and v^'^^ derived from any solution (f) to 
(ll.lj) . we have 

(3.1) ?xM =r(")^;("). 



3 



This claim follows from the fact that a Neumann boundary value problem on Q^'^^ has a unique 
solution. The operator T^"^) is mathematically an integral operator called the Neumann-to-Dirichlet 
operator. 



Observation: The N2D operator is in general a complicated object. It has a singular kernel even 
for domains with smooth boundaries. When the domain boundary has corners, further complications 
arise. However, in our case, all such subtleties can be ignored since the boundary potentials of interest 
are all restrictions of functions that globally solve (II. Ih . and are in consequence smooth. 



The discrete analog of the equation (|3.ip is 

(3.2) -uM 

For our purposes, it is sufficient for the matrix T^"^) to correctly construct u^"^^ for any permissible 
vectors v^'^\ (By permissible, we mean that they are the restriction of a function in the solution set 
under consideration.) Written out in components, T^"^) is a 4 x 4 block matrix that satisfies 



(3.3) 



3.2. Construction of the N2D operator for a small box. Let Q^'^^ be a small box with edges 
p{«i)^ p{«2)^ p{«3)^ r^*''^ as shown in Figure^ and consider the task of constructing a matrix T^"^^ such 
that (13. 2p holds for all permissible potentials. We generate (by brute force) a collection of solutions 
jj^™"^ that locally span the solution space to the desired precision. For each (pj, we construct the 
corresponding vectors of boundary values 



- life) " 






Tfei2) 


Jir,l3) 






- i;fe) - 


life) 




Tfe2i) 


Tfe22) 


j{r,23) 


Tfe24) 




l,fe) 


life) 




Tfe3i) 




j{r,33) 


Tfe34) 




i;fe) 


ufe) 




Tfe41) 


jirA2) 


T{^.43) 


j(r,44) 




-yfe) 



0,(7fe)) 

0,-(7fe)) 
</,,(7fe)) 

0,(7(^4)) 



and 
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and then we construct via a least squares procedure a matrix T^'^^ such that the equation 

(3.4) [Ul U2 ■■■ lt7V,an.p] = T^^) [Vl V2 ■■■ -yjVsamp] 

holds to within the specified tolerance e. 

The sample functions are constructed by solving a set of local problems on a patch ^ that 
covers the domain as shown in Figure [5l The local problems read 



(3.5) 



-V(a(x)V(/>j(x)) + b{x) (j)j{x) 
dn(t){x) 



0, 

Vj{x), 



X G d^, 



where a and h are functions chosen so that: 

(1) For X G we have a{x) = a{x) and b{x) 

(2) The equation (j3.5p is easy to solve. 



h{x). 



The collection of boundary data {vj^^^^'^ is chosen so that the solution space is sufficiently "rich." 



samp 



3.3. Assembling a global equilibrium equation. In this section, we will formulate a linear 
equation that relates the following variables: 

Given data: : i is an edge that is exterior to 0}, 

Sought data: {?;^*^ : i is an edge that is interior to Vt}. 
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Let Nedge denote the number of interior edges. Then the coefficient matrix of the hnear system 
will consist of A'edge X -^edgc blocks, each of size A'^gauss x -^gauss- Each block row in the system will 
have at most 7 non-zero blocks. To form this matrix, let i denote an interior edge. Suppose for 
a moment that i is a vertical edge. Let ri and T2 denote the two boxes that share the edge i, let 
{mi, 7712, fn3, m^} denote the edges of ri, and let {ni, ?i2, 713, n^} denote the edges of T2, see Figure 
[3l The N2D operator for ri provides an equation for the boundary fluxes of the left box: 

(3,g) li^^^) ^ j(ti,21) ^{mi) _|_ j(ri,22) ^(012) _|_ j(ri,23) ^(m-s) _|_ j(ti,24) ^("14)^ 

Analogously, the N2D operator for T2 provides the equation 

(37) ^ j(t-2,41) ^(m) _^ j(t2,42) ^(n2) _^ j(-r2,43) ^(ng) _^ j(r2,44) 

Observing that m2 = n2 = i, we see that ^("^2) — ^t("4)^ g.^(^ consequently p.6p and (j3.7p can be 
combined to form the equation 

(3.8) T(^1'21) + T(^1'22) + jin,'^^) y{m3) _^ T(ri,24) ^(7714) 

= j(^2,41) ^(ni) _^ j(t2,42) ^(n2) _^ j(r2,43) ^(ns) _^ j(r2,44) 



The collection of all equations of the form (j3.8|) for interior vertical edges, along with the analogous 
set of equations for all interior horizontal edges forms the global equilibrium equation. 

4. Efficient direct solvers 

This section describes a direct solver for the global equilibrium equation constructed in Section O 
The idea is to partition the box Q into a quad-tree of boxes, and then to construct the N2D operator 
X('^) for each box r in the tree. The first step is to loop over all leaf nodes of the tree and construct 
the N2D operator via the procedure described in Section 13. 2[ Then we execute an upwards sweep 
through the tree, where we construct the N2D operator for a box by merging the operators of its four 
children. For simplicity (and also computational efficiency) we execute each "merge-four" operation 
as a set of three "merge-two" operations. 

Let N denote the size of the coefficient matrix. Then Section 14.21 describes a procedure with 
0{N^-^) complexity, and Section 14.31 sketches out how the procedure can be accelerated to 0{N) 
complexity. Before describing the fast solvers, we describe a hierarchical decomposition of the domain 
in Section STTJ 

4.1. A quad-tree on the domain. A standard quad-tree is formed on the computational domain 
as follows: Let O^^-* = be the root of the tree, as shown in Figure Ul^a). Then split ^l^^^ into four 
disjoint boxes 

as shown in Figure ID^b) . Continue by splitting each of the four boxes into four smaller equisized 
boxes: 

n(5) = o(6)uo(^)uo(8)uo(9), 

0(6) = 0(10)U 0(11) U 0(12) U 0(13), 

o(^) = 0(1-1) u 0(15) u 0(16) uo(i^), 
0(8) = 0(18) u 0(19) uo(20)u 0(21), 

as shown in Figure ID^c) . The process continues until each box is small enough that the N2D operator 
for each leaf can easily be constructed via the procedure described in Section 13.21 The levels of the 
tree are ordered so that ^ = is the coarsest level (consisting only of the root), i = 1 is the level 
with four boxes, etc. We let L denote the total number of levels in the tree. 
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4.2. Simple construction of the N2D operator for a parent. Suppose that a is a box with 
children ui and 1^3 as shown in Figure [6l and that we know the matrices T^^^^) and T^'^^) associated 
with the children. We seek to construct the matrix T^'^\ The equilibrium equations for the two 
children read 

(4.1) = ^^^1 T(^I'^J) -u^"^^), i = l,2, 3, 4, 

(4.2) = T^^'s-ij) i = l,2, 3, 4. 

Observing that ti^™^) — ^(n-n) -^g combine ()4.ip for z = 2 with (j4.2p for i = 4 to obtain the joint 
equation 

(4.3) T^''!'^^) t^^^i) + J^"^''^'^^ i;(™2) _^ j{i^i,23) ^(ma) _^ j{i^i,24) ^(^4) 

Utihzing further that i)^™^) = i,("4)^ ^j-j^g (j43|) along with (|4T]l and ([42]) as 
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Eliminating i)(''"2) from the system via a Schur complement yields the operator T^"^) (upon suitable 
reblocking) . 

At this point, we have described a "merge-two" operation. A "merge-four" operation can of course 
be obtained by simply combining three merge-two operations. To be precise, suppose that r is a 
node with the four children z^i, 1^2^ v^- We introduce the two "intermediate" boxes o\ and 02 as 
shown in the following figure: 

0-2 



V2 









Letting the procedure described earlier in the section be denoted by "merge_two_horizontal" and 
defining an analogous function "merge_two_vertical" we then find that 

T('^i) = merge_two_horizontal(T(''i), T^^^))^ 

T('^2) = merge_two_horizontal(T(''2)^ T^^^)), 
= nierge_two_vertical(T('^i), T(''2)), 

4.3. Fast construction of the N2D operator for a parent. The merge operation described in 
Section 14.21 has asymptotic cost 0{N^'^\ where is the total number of points on the edges of 
the leaves. To simplify slightly, the reason is that forming the "merge" operation requires matrix 
inversion and matrix-matrix-multiplications for of dense matrices whose size eventually grow to 
0(-v/iV) X 0(\/iV). However, these matrices all have internal structure. To be precise, in (j3.3p . 
the off-diagonal blocks have low rank, and the diagonal blocks are all Hierarchically Semi-Separable 
(HSS) matrices. This means that accelerated matrix algebra can be used. For a matrix of size 
N' X A^', inversion can in fact be executed in 0{N') operations (provided that the "HSS-rank" is a 
fixed low number, which it is in this case). 

The acceleration procedure proposed here is analogous to the one suggested in [Xia,Ying]. 



Figure 1. The computational box 0, (gray) is split into 16 small boxes. There are 
a total of 40 edges in the discretization, 24 interior ones (solid lines) and 16 exterior 
ones (dashed lines). One interior edge F^*) is marked with a bold line. (Each edge 
continues all the way to the corner, but has been drawn slightly forshortened for 
clarity.) 
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Figure 2. The box is marked in gray. Its edges are F(*i), F(*2), F^), F^**). 
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Figure 3. Construction of the equilibrium equation for the edge F^'^ in Fig- 
ure [H It is the common edge of the boxes 0^"^^^ and which have edges 
|r(mi)^ P(m2)^ Yima)^ Yi^i)^^ ^ud {F('^i), F^"^), F^"^), r^"^)}, respectively. Observe 
that F(*) = F('"2) = r("*) (the bold line). 
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Figure 4. Tree structure for a tree with L = 3 levels. There are 10 Gaussian nodes 
on each side of the leaf boxes. The black dots mark the points at which the solution 
(p and its derivative (in the direction normal to the indicated patch boundary) are 
tabulated. 
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Figure 5. Two choices of geometry for the local patch computation. The choice (a) 
is natural since it conforms to the overall geometry. The advantage of choice (b) is 
that the FFT can be used in the angular direction. 
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Figure 6. Geometry of the merge operation. 




Figure 7. Geometry of the method described in Section ??. The strip between the 
two squares has thickness 5. 



